Rewind filehandle request bodies before retrying requests #1444
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##           master    #1444    +/-   ##
=======================================
  Coverage   88.61%   88.61%
=======================================
  Files          77       77
  Lines       10563    10565       +2
=======================================
+ Hits         9360     9362       +2
  Misses       1203     1203
```

Flags with carried forward coverage won't be shown.
Well, this is quite some serious digging! Thank you @jwodder! I think we should keep the machinery available for now, since IIRC there should be no performance hit, and it might still come in handy if we run into such a situation again for one reason or another. Should we close #1408 altogether with this PR, or do you think there would be more?
```diff
@@ -233,6 +233,8 @@ def request(
                 url,
                 result.text,
             )
+        if data is not None and hasattr(data, "seek"):
+            data.seek(0)
```
could/should we maybe add a test which would trigger such a case, e.g. by shimming `self.session.request` and always failing upon the first try after reading the file?
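A minimal sketch of that shimming idea (not dandi-cli's actual test suite; `FlakySession` and `request_with_retry` are hypothetical stand-ins for `self.session.request` and the retry loop): a fake session consumes the body and fails on the first call, letting us assert that the retry logic rewinds the filehandle before resending.

```python
import io

class FlakySession:
    """Shim for a session whose request() always fails on the first try."""

    def __init__(self):
        self.calls = 0
        self.bodies = []

    def request(self, method, url, data=None):
        self.calls += 1
        # Consume the body, as requests would when streaming a filehandle
        self.bodies.append(data.read())
        if self.calls == 1:
            raise RuntimeError("simulated 500 from S3")
        return "ok"

def request_with_retry(session, method, url, data=None, retries=1):
    for attempt in range(retries + 1):
        try:
            return session.request(method, url, data=data)
        except RuntimeError:
            if attempt == retries:
                raise
            # The fix under review: rewind the body before retrying
            if data is not None and hasattr(data, "seek"):
                data.seek(0)

session = FlakySession()
fh = io.BytesIO(b"zarr entry bytes")
assert request_with_retry(session, "PUT", "https://example.test/upload", data=fh) == "ok"
# Without the seek(0), the second read() would return b"" and the upload
# would fall back to chunked transfer encoding (the HIFNI failure mode).
assert session.bodies == [b"zarr entry bytes", b"zarr entry bytes"]
```

The second assertion is the interesting one: the retry sends the full body again only because the filehandle was rewound between attempts.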
`responses` doesn't seem to support requests where `data` is a filehandle: getsentry/responses#719
ok, if there is no easy way to test ATM, I am ok to proceed to see it fixed and let others give it a shot. Adding the `release` label as well.
Unless you manage to get the error again with this patch (and the run on smaug is currently on its seventh upload without any problems), we can probably close it.
should we just undraft and merge it?
@yarikoptic Undrafted.
🚀 PR was released in
Closes #1408.
Here's what was going on: Occasionally, requests to S3 to upload a Zarr entry fail with a 500 status due to an internal error on S3's end, causing `dandi-cli` to retry the request. When it retries the request, it calls `session.request()` again, passing in the same arguments as before, which include an open filehandle for reading the Zarr entry from disk. However, the filehandle was already read to the end on the initial request (the one that resulted in a 500); thus, although `requests`'s `super_len()` obtains the correct value for the file's length, it then subtracts the filehandle's current position (the end of the file) from this length to get the number of bytes that would be produced by reading from the current position to the end of the file: zero. And, as previously established, when `super_len()` returns 0, `requests` falls back to "chunked" transfer encoding, which S3 responds to with a 501 error about "header implies functionality not implemented" (hereafter "HIFNI").

The logs for Error during upload, "A header you provided implies functionality that is not implemented" #1033, the original report of the HIFNI problem, are no longer available, so I cannot check whether this was happening there as well.
This was what caused the initial HIFNI in Errors while uploading #1257, based on the logs in this comment (I didn't check the other error logs posted later in the thread).
While this patch should eliminate most instances of the HIFNI problem, it is still conceivable that the original hypothesized cause — NFS erroneously reporting filesizes as zero — could occur. @yarikoptic How much of the previously-added infrastructure for dealing with this problem should we keep around after this?